fix: resolve FixedWindowCallRatePolicy deadlock on 429 without ratelimit-reset header#985
Draft
devin-ai-integration[bot] wants to merge 1 commit intomainfrom
Draft
Conversation
…mit-reset header Four interacting bugs caused connectors using FixedWindowCallRatePolicy to deadlock when a 429 response arrived from an API that lacks a ratelimit-reset header: 1. next_reset_ts initialized 10 days in the future instead of now + period 2. get_reset_ts_from_response() never fell back to retry-after header 3. _update_current_window() only advanced by one period instead of catching up 4. _do_acquire() had no upper bound on sleep duration Fixes: - Initialize next_reset_ts to now + period in model_to_component_factory.py - Fall back to retry-after header in get_reset_ts_from_response() - Use while loop in _update_current_window() to advance past all elapsed periods - Cap maximum sleep in _do_acquire() to 600 seconds with a warning log Co-Authored-By: bot_apk <apk@cognition.ai>
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. 💡 Show Tips and TricksTesting This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1775687146-fix-fixed-window-deadlock#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1775687146-fix-fixed-window-deadlockPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes a deadlock in
FixedWindowCallRatePolicythat caused connectors to hang for ~10 days (until the heartbeat monitor killed them) when a 429 response arrived from an API that lacks aratelimit-resetheader.The deadlock was caused by four interacting bugs:
next_reset_tsinitialized 10 days in the future (model_to_component_factory.py): Changed tonow + periodinstead ofnow + 10 days.retry-afterheader ignored (call_rate.py):get_reset_ts_from_response()now falls back to the standardretry-afterheader (interpreted as seconds) whenratelimit-resetis absent._update_current_window()single-step advance (call_rate.py): Changedif→whileso the window catches up past all elapsed periods, not just one.call_rate.py):_do_acquire()now caps sleep to 600 seconds with a warning log.Resolves https://github.com/airbytehq/oncall/issues/11924:
Review & Testing Checklist for Human
whileloop safety in_update_current_window(): Verify that_offset(the period) can never be zero or negative, which would cause an infinite loop. It comes fromparse_duration(model.period)— confirm this always returns a positive timedelta.retry-afterheader parsing: The implementation only handles integer-seconds format, not the HTTP-date format allowed by RFC 7231. Verify this is acceptable for the APIs Airbyte connectors interact with, or if HTTP-date parsing should be added.Suggested test plan: Run a connector that uses
FixedWindowCallRatePolicyagainst an API that returns 429 withoutratelimit-resetheaders (or mock one) and verify the connector retries within seconds rather than hanging indefinitely.Notes
Link to Devin session: https://app.devin.ai/sessions/6e4ee57d58264eca9e7f33e431983f9c